Many datasets are biased: they contain easy-to-learn features that are highly correlated with the target class in the dataset but not in the true underlying distribution of the data. For this reason, learning unbiased models from biased data has become a highly relevant research topic in recent years. In this work, we tackle the problem of learning representations that are robust to biases. We first present a margin-based theoretical framework that clarifies why recent contrastive losses (InfoNCE, SupCon, etc.) can fail when dealing with biased data. Building on it, we derive a novel formulation of the supervised contrastive loss (epsilon-SupInfoNCE) that provides more accurate control of the minimal distance between positive and negative samples. Furthermore, thanks to our theoretical framework, we propose FairKL, a new debiasing regularization loss that works well even with extremely biased data. We validate the proposed losses on standard vision datasets, including CIFAR10, CIFAR100, and ImageNet, and assess the debiasing capability of FairKL combined with epsilon-SupInfoNCE, reaching state-of-the-art performance on a number of biased datasets, including real instances of biases in the wild.
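As a concrete illustration, here is a minimal PyTorch sketch of a margin-based supervised contrastive loss in the spirit of epsilon-SupInfoNCE: subtracting a margin epsilon from each positive similarity enforces sim(anchor, positive) - sim(anchor, negative) >= epsilon. The exact normalization and the placement of the margin relative to the temperature in the paper may differ from this simplification.

```python
import torch
import torch.nn.functional as F

def eps_sup_infonce(features, labels, epsilon=0.1, temperature=0.1):
    """features: (N, D) embeddings; labels: (N,) integer class ids.

    For each anchor, every same-class sample is a positive and every
    other-class sample a negative; one InfoNCE term per (anchor, positive)
    pair, with the set of negatives shared across positives.
    """
    z = F.normalize(features, dim=1)
    sim = z @ z.t()                                   # cosine similarities
    n = sim.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=sim.device)
    pos_mask = (labels[:, None] == labels[None, :]) & ~eye
    neg_mask = labels[:, None] != labels[None, :]

    pos = (sim - epsilon) / temperature               # margin-shifted positives
    neg_exp = ((sim / temperature).exp() * neg_mask).sum(dim=1, keepdim=True)
    loss = -(pos - torch.log(pos.exp() + neg_exp))
    return (loss * pos_mask).sum() / pos_mask.sum().clamp(min=1)
```

Setting epsilon to zero recovers a SupCon-style objective with shared negatives, which is what makes the margin an explicit knob on the positive-negative separation.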
Current contrastive learning methods use random transformations, sampled from a large list of transformations with fixed hyperparameters, to learn invariances from unannotated data. Following previous work that introduces a small amount of supervision, we propose a framework for finding optimal transformations for contrastive learning using a differentiable transformation network. Our method improves performance in the low-annotation regime, both in supervised accuracy and in convergence speed. In contrast to previous work, the transformation optimization does not require a generative model. Transformed images retain the information relevant to solving the supervised task, here classification. Experiments are performed on 34000 2D slices of brain magnetic resonance images and 11200 chest X-ray images. On both datasets, with 10% of the labeled data, our model obtains better performance than a fully supervised model trained with 100% of the labels.
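The key enabling idea is that an augmentation can be made differentiable so that its parameters receive gradients from the same backward pass as the encoder. A minimal sketch, assuming a single global rotation-plus-translation as the learnable transformation (the authors' transformation network is richer than this):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableAffine(nn.Module):
    """Applies an affine transform whose parameters are trained end-to-end."""
    def __init__(self):
        super().__init__()
        self.angle = nn.Parameter(torch.zeros(1))   # rotation (radians)
        self.shift = nn.Parameter(torch.zeros(2))   # (tx, ty) in [-1, 1] coords

    def forward(self, x):                           # x: (B, C, H, W)
        c, s = torch.cos(self.angle), torch.sin(self.angle)
        theta = torch.stack([
            torch.cat([c, -s, self.shift[:1]]),
            torch.cat([s,  c, self.shift[1:]]),
        ]).unsqueeze(0).expand(x.size(0), -1, -1)   # (B, 2, 3)
        grid = F.affine_grid(theta, x.shape, align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)
```

Because `grid_sample` is differentiable with respect to the sampling grid, the angle and shift can be optimized jointly with a contrastive loss, with no generative model in the loop.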
CNNs are commonly assumed to be capable of using contextual information about distinct objects within their receptive field, such as their directional relations. However, the nature and limits of this ability have never been fully explored. We explore a specific type of relation, direction, using a standard U-Net trained to optimize a cross-entropy loss function for segmentation. We train the network on a pretext segmentation task that requires directional-relation reasoning to be solved, and show that, given enough data and a sufficiently large receptive field, it successfully learns the proposed task. We further explore scenarios in which the directional relations are perturbed, and show that the network has indeed learned to reason using these relations.
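To make the idea of such a pretext task concrete, here is a toy generator where the target is defined purely by a directional relation ("the square to the right of the disk"): the two squares are identical in appearance, so segmenting the correct one is impossible without relational reasoning. The paper's actual pretext task may differ in its details.

```python
import numpy as np

def make_sample(size=64, rng=None):
    """Returns (image, mask): mask covers only the square right of the disk."""
    if rng is None:
        rng = np.random.default_rng()
    img = np.zeros((size, size), dtype=np.float32)
    mask = np.zeros((size, size), dtype=np.float32)

    # reference disk at a random position
    cy, cx = rng.integers(16, size - 16, size=2)
    yy, xx = np.ogrid[:size, :size]
    img[(yy - cy) ** 2 + (xx - cx) ** 2 <= 16] = 1.0

    # two identical squares, left and right of the disk
    for dx, is_target in ((-12, False), (12, True)):
        y, x = cy, cx + dx
        img[y - 3:y + 3, x - 3:x + 3] = 1.0
        if is_target:
            mask[y - 3:y + 3, x - 3:x + 3] = 1.0
    return img, mask
```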
Contrastive learning has shown impressive results on natural and medical images without requiring annotated data. However, a particularity of medical images is the availability of metadata (such as age or sex) that can be exploited to learn representations. Here, we show that the recently proposed contrastive y-Aware InfoNCE loss, which integrates multi-dimensional metadata, asymptotically optimizes two properties: conditional alignment and global uniformity. Similarly to [Wang, 2020], conditional alignment means that similar samples should have similar features, but conditionally on the metadata. Global uniformity instead means that the (normalized) features should be uniformly distributed on the unit hypersphere, independently of the metadata. Here, we propose instead to define conditional uniformity, which depends on the metadata and repels only samples with dissimilar metadata. We show that direct optimization of both conditional alignment and conditional uniformity improves the representations, in terms of linear evaluation, on both CIFAR-100 and a brain MRI dataset.
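A hedged sketch of what conditional alignment and conditional uniformity can look like in the style of Wang and Isola's losses; the Gaussian metadata kernel and the hard dissimilarity threshold below are illustrative choices, not necessarily the kernels used in the paper.

```python
import torch

def conditional_alignment(z, meta, bandwidth=1.0):
    """z: (N, D) L2-normalized features; meta: (N,) continuous metadata."""
    d2 = torch.cdist(z, z).pow(2)                     # pairwise feature distances
    w = (-(meta[:, None] - meta[None, :]).pow(2) / (2 * bandwidth**2)).exp()
    eye = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    w = w.masked_fill(eye, 0.0)                       # ignore self-pairs
    return (w * d2).sum() / w.sum()                   # pull metadata-similar pairs

def conditional_uniformity(z, meta, threshold=1.0, t=2.0):
    d2 = torch.cdist(z, z).pow(2)
    dissim = ((meta[:, None] - meta[None, :]).abs() > threshold).float()
    # Gaussian potential computed only over metadata-dissimilar pairs
    return torch.log((dissim * torch.exp(-t * d2)).sum() / dissim.sum().clamp(min=1))
```

The difference from global uniformity is visible in the mask: samples whose metadata are close are simply excluded from the repulsive term rather than pushed apart.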
Real-world robotic grasping can be done robustly if a complete 3D Point Cloud Data (PCD) of an object is available. However, in practice, PCDs are often incomplete when objects are viewed from a few sparse viewpoints before the grasping action, leading to the generation of wrong or inaccurate grasp poses. We propose a novel grasping strategy, named 3DSGrasp, that predicts the missing geometry from the partial PCD to produce reliable grasp poses. Our proposed PCD completion network is a Transformer-based encoder-decoder network with an Offset-Attention layer. Our network is inherently invariant to the object pose and point permutation, and generates PCDs that are geometrically consistent and properly completed. Experiments on a wide range of partial PCDs show that 3DSGrasp outperforms the best state-of-the-art method on PCD completion tasks and largely improves the grasping success rate in real-world scenarios. The code and dataset will be made available upon acceptance.
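For readers unfamiliar with Offset-Attention, a simplified PyTorch block in the spirit of point-cloud transformers is sketched below: the layer feeds the offset between the input features and their self-attention aggregation through a small MLP and adds it back, which tends to be more stable on unordered point sets. This is a generic sketch, not the 3DSGrasp implementation.

```python
import torch
import torch.nn as nn

class OffsetAttention(nn.Module):
    """Self-attention whose residual carries the offset x - attention(x)."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim // 4, bias=False)
        self.k = nn.Linear(dim, dim // 4, bias=False)
        self.v = nn.Linear(dim, dim, bias=False)
        self.proj = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.scale = (dim // 4) ** 0.5

    def forward(self, x):                                   # x: (B, N, dim)
        attn = torch.softmax(
            self.q(x) @ self.k(x).transpose(1, 2) / self.scale, dim=-1)
        agg = attn @ self.v(x)                              # per-point aggregation
        return x + self.proj(x - agg)                       # offset residual
```

Because attention aggregates over the point dimension and the residual is pointwise, the block's output permutes consistently with its input, which is the permutation property the abstract refers to.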
Artificial neural networks can learn complex, salient data features to achieve a given task. On the opposite end of the spectrum, mathematically grounded methods such as topological data analysis allow users to design analysis pipelines fully aware of data constraints and symmetries. We introduce a class of persistence-based neural network layers. Persistence-based layers allow users to easily inject knowledge about symmetries (equivariance) respected by the data, are equipped with learnable weights, and can be composed with state-of-the-art neural architectures.
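As a minimal sketch of what a learnable layer over persistence diagrams can look like (a PersLay-style weighted sum is my illustrative choice here, not necessarily the paper's construction): each (birth, death) point is mapped by a learnable function and pooled with a permutation-invariant sum, so invariance to reorderings of the diagram holds by construction.

```python
import torch
import torch.nn as nn

class PersistenceLayer(nn.Module):
    def __init__(self, hidden=32, out=16):
        super().__init__()
        self.point_map = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, out))

    def forward(self, diagrams, mask):
        """diagrams: (B, P, 2) padded (birth, death) pairs;
        mask: (B, P) with 1 for real points, 0 for padding."""
        feats = self.point_map(diagrams)               # (B, P, out)
        pers = diagrams[..., 1] - diagrams[..., 0]     # persistence as weight
        w = (pers * mask).unsqueeze(-1)                # down-weight noisy points
        return (feats * w).sum(dim=1)                  # (B, out), order-invariant
```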
Quantifying motion in 3D is important for studying the behavior of humans and other animals, but manual pose annotations are expensive and time-consuming to obtain. Self-supervised keypoint discovery is a promising strategy for estimating 3D poses without annotations. However, current keypoint discovery approaches commonly process single 2D views and do not operate in the 3D space. We propose a new method to perform self-supervised keypoint discovery in 3D from multi-view videos of behaving agents, without any keypoint or bounding box supervision in 2D or 3D. Our method uses an encoder-decoder architecture with a 3D volumetric heatmap, trained to reconstruct spatiotemporal differences across multiple views, in addition to joint length constraints on a learned 3D skeleton of the subject. In this way, we discover keypoints without requiring manual supervision in videos of humans and rats, demonstrating the potential of 3D keypoint discovery for studying behavior.
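The differentiable step that makes such end-to-end discovery possible is turning a volumetric heatmap into coordinates; a sketch of a 3D soft-argmax is below (`soft_argmax_3d` is a hypothetical helper name, and the method's full pipeline additionally trains these coordinates through the multi-view reconstruction and joint-length objectives described above).

```python
import torch

def soft_argmax_3d(heatmap):
    """heatmap: (B, K, D, H, W), one volume per keypoint."""
    b, k, d, h, w = heatmap.shape
    probs = torch.softmax(heatmap.reshape(b, k, -1), dim=-1)
    probs = probs.reshape(b, k, d, h, w)
    zs = torch.linspace(0, 1, d, device=heatmap.device)
    ys = torch.linspace(0, 1, h, device=heatmap.device)
    xs = torch.linspace(0, 1, w, device=heatmap.device)
    z = (probs.sum(dim=(3, 4)) * zs).sum(-1)   # expectation over depth
    y = (probs.sum(dim=(2, 4)) * ys).sum(-1)   # expectation over height
    x = (probs.sum(dim=(2, 3)) * xs).sum(-1)   # expectation over width
    return torch.stack([x, y, z], dim=-1)      # (B, K, 3) normalized coords
```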
Artificial intelligence is set to be deployed in operating rooms to improve surgical care. This early-stage clinical evaluation shows the feasibility of concurrently attaining real-time, high-quality predictions from several deep neural networks for endoscopic video analysis deployed for assistance during three laparoscopic cholecystectomies.
AI-based code generators are an emerging solution for automatically writing programs starting from descriptions in natural language, by using deep neural networks (Neural Machine Translation, NMT). In particular, code generators have been used for ethical hacking and offensive security testing by generating proof-of-concept attacks. Unfortunately, the evaluation of code generators still faces several issues. The current practice uses automatic metrics, which compute the textual similarity of generated code with ground-truth references. However, it is not clear what metric to use, and which metric is most suitable for specific contexts. This practical experience report analyzes a large set of output similarity metrics on offensive code generators. We apply the metrics on two state-of-the-art NMT models using two datasets containing offensive assembly and Python code with their descriptions in the English language. We compare the estimates from the automatic metrics with human evaluation and provide practical insights into their strengths and limitations.
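To ground what "output similarity metrics" means here, below are two generic token-level examples, exact match and an edit-based similarity; they illustrate the family of metrics the report studies and are not necessarily the exact implementations it analyzes.

```python
import difflib

def exact_match(pred: str, ref: str) -> float:
    """1.0 iff the generated code matches the reference token-for-token."""
    return float(pred.split() == ref.split())

def edit_similarity(pred: str, ref: str) -> float:
    """Ratio in [0, 1] from difflib's longest-matching-blocks heuristic."""
    return difflib.SequenceMatcher(None, pred.split(), ref.split()).ratio()

# near-miss on assembly: high edit similarity, zero exact match
print(exact_match("mov eax , 1", "mov eax , 2"))      # 0.0
print(edit_similarity("mov eax , 1", "mov eax , 2"))  # ~0.75
```

The near-miss example is precisely the kind of case where automatic metrics and human judgment can diverge: a one-token difference may or may not change the semantics of an attack.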
Assessing the critical view of safety in laparoscopic cholecystectomy requires accurate identification and localization of key anatomical structures, reasoning about their geometric relationships to one another, and determining the quality of their exposure. In this work, we propose to capture each of these aspects by modeling the surgical scene with a disentangled latent scene graph representation, which we can then process using a graph neural network. Unlike previous approaches using graph representations, we explicitly encode in our graphs semantic information such as object locations and shapes, class probabilities and visual features. We also incorporate an auxiliary image reconstruction objective to help train the latent graph representations. We demonstrate the value of these components through comprehensive ablation studies and achieve state-of-the-art results for critical view of safety prediction across multiple experimental settings.
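A sketch of the kind of graph the abstract describes: each anatomical structure becomes a node carrying location, shape, class-probability, and visual features, and a message-passing layer updates nodes from their neighbors. The feature layout and the single GRU-based layer are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MessagePassing(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)   # message from (receiver, sender)
        self.upd = nn.GRUCell(dim, dim)      # node-state update

    def forward(self, nodes, adj):
        """nodes: (N, dim) node features; adj: (N, N) 0/1 adjacency."""
        n = nodes.size(0)
        pair = torch.cat([nodes.unsqueeze(1).expand(-1, n, -1),
                          nodes.unsqueeze(0).expand(n, -1, -1)], dim=-1)
        messages = (adj.unsqueeze(-1) * self.msg(pair)).sum(dim=1)  # aggregate
        return self.upd(messages, nodes)                            # update

# hypothetical node layout: [x, y, w, h | class probs | pooled visual features]
```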